Abstract



One of the most interesting unsolved questions in science today is the question of life on other
planets. At present it is safe to say that we do not have much of an idea as to whether life is
common or exceedingly rare in the universe, and this will probably not be settled for certain
unless definitive evidence of extraterrestrial life is found in the future. Our presence on Earth is
just as consistent with the hypothesis that life is extremely rare as it is with the hypothesis that
it is common, since if there were only one planet with intelligent life, we would find ourselves on
it. However, we have more information than this, such as the surprisingly short length of time
it took for life to arise on Earth. Previous authors have analysed this information, concluding
that it is evidence that the probability of abiogenesis is moderate (> 13% with 95% probability)
and cannot be extremely small. In this paper I use a simple probabilistic model to show that this
conclusion was based more on an unintentional assumption than on the data. While the early
formation of life on Earth provides some evidence in the direction of life being common, it is
far from conclusive, and in particular does not rule out the possibility that abiogenesis has only
occurred once in the history of the universe.

1 Introduction

Attempting to make predictions about life elsewhere based on observations about Earth is
inherently difficult due to the sample size of 1. It is also fraught with controversial "anthropic"
considerations (Smolin, 2004). However, there is no reason in principle why it cannot be done. If
we use probability theory to model uncertainty (Jaynes, 2003), and data about life on Earth really
is uninformative about extraterrestrial life, then probability theory will return wide probability
distributions, indicating the large uncertainty.

The surprising fact that life arose on Earth very quickly after its formation (e.g. Mojzsis et al.,
1996), and at the end of a likely phase of sterilisation due to frequent impacts, has been used to
argue that abiogenesis must therefore be easy. Lineweaver & Davis (2002, 2003, hereafter L&D)
have modelled this reasoning with probability theory and concluded with 95% confidence (Bayesian
posterior probability) that the probability of abiogenesis on an Earth-like planet is greater than
13%. This was done using a model with a constant hazard $q$ (the chance of life arising
per discrete time interval). The probability distribution for the time $t_L$ at which life arises
(our $t_L$ corresponds to L&D's $\Delta t_{\rm biogenesis}$) depends on $q$, and it is calculated
conditional on the fact that $t_L$ must be less than the age of the Earth, to correct for the fact
that we could not have observed the Earth unless life began. This probability distribution becomes
a likelihood function for $q$ when the observed $t_L$ is substituted into it. Combined with a prior
distribution for $q$, it allows inferences about the value of $q$.

Whilst it is possible and interesting to calculate such things, the model used by L&D contains
a flaw that renders the conclusion invalid. The conclusion quoted above depends
on a choice of prior distribution over $q$ that is overconfident and unrealistic as a description of
our state of knowledge about abiogenesis. While uniform priors representing "initial ignorance" are
common in Bayesian analysis, a uniform prior for an unknown probability such as $q$ is actually
quite informative (Jaynes, 2003, chapter 18), assigning most of its probability to moderate values
of $q$ and ignoring the possibility of extreme values. That the uniform prior is inappropriate can
be illustrated using the technique of elaboration: are we happy with all of the implied consequences
of assuming this prior distribution? For instance, one implication is that $q \in [0.49, 0.51]$ is
just as plausible as $q \in [0, 0.02]$, whereas if we are ignorant, the possibility that $q$ is
smaller than this by many orders of magnitude should not be ignored, as it almost is by the uniform
prior. A more realistic representation of complete prior ignorance would be the improper Haldane
prior $\propto [q(1-q)]^{-1}$ (Jaynes, 1968), or a modification that removes the singularities at
$q = 0$ and $q = 1$ and makes the prior proper. The Haldane prior corresponds to an improper
uniform prior for the "logit" $\log[q/(1-q)]$, representing uncertainty not just about the exact
value of $q$ but also about its order of magnitude. This paper uses a model that bypasses direct
use of $q$ and deals with expected waiting times instead, although its conclusions can be
interpreted in the L&D framework as well.
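The elaboration argument above can be checked numerically. The following is a minimal sketch, not
part of the original analysis: it compares the prior probability of two intervals of $q$ under a
uniform prior and under one possible proper modification of the Haldane prior, namely a simple
truncation to $[\epsilon, 1-\epsilon]$. The truncation and the value $\epsilon = 10^{-12}$ are
assumptions chosen purely for illustration.

```python
import math


def logit(q):
    """Log-odds of a probability q in (0, 1)."""
    return math.log(q / (1.0 - q))


def uniform_mass(a, b):
    """Prior probability that q lies in [a, b] under Uniform(0, 1)."""
    return b - a


def truncated_haldane_mass(a, b, eps=1e-12):
    """Prior probability that q lies in [a, b] under a density proportional to
    1/(q(1-q)), truncated to [eps, 1-eps] so that it is proper.

    The truncation and the default eps are illustrative assumptions. Under
    this prior the logit of q is uniformly distributed.
    """
    lo, hi = max(a, eps), min(b, 1.0 - eps)
    if hi <= lo:
        return 0.0
    return (logit(hi) - logit(lo)) / (logit(1.0 - eps) - logit(eps))


for a, b in [(0.49, 0.51), (0.0, 0.02), (0.0, 1e-6)]:
    print(f"q in [{a}, {b}]: uniform = {uniform_mass(a, b):.3g}, "
          f"truncated Haldane = {truncated_haldane_mass(a, b):.3g}")
```

Under the uniform prior the first two intervals carry the same probability, while the truncated
Haldane prior places far more weight on the interval containing very small values of $q$. The
exact numbers depend on the assumed cutoff $\epsilon$.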

The conclusions of Lineweaver & Davis (2002) have been criticised previously on the basis
that no observer would ever see a recent abiogenesis due to the large number of intermediate steps
required between abiogenesis and the development of intelligent life (Flambaum, 2003). However, it
is still possible to imagine life arising after 5 Gyr on a planet and intelligent observers
discovering this at, say, $t = 8$ Gyr. The fact that we are not in this situation could still be
considered surprising, and therefore informative (Lineweaver & Davis, 2003).



2 The Model 

Suppose that there existed a planet that is identical with the early Earth (when conditions have
settled down to be suited for life; call this time $t = 0$) in terms of all of its macroscopic
parameters: mass, temperature, chemical composition, distance from its Sun (which is identical with
our Sun), etc. Of course, this model only applies to planets that are Earthlike in terms of their
biological characteristics. While this may seem restrictive, observations about what actually
occurred on Earth cannot be relevant to planets that do not have this property. Imagine we are
given the value of a constant, $\mu$, which is the expected waiting time for the first abiogenesis
on a planet with the above conditions. From standard survival analysis, $1/\mu$ is proportional to
the probability per unit time of the event happening, and plays the same role as $q$ in L&D's work.
We are then informed, to our great surprise, that the following events occurred on the planet:

- Proposition S: At time $t = t_0$ (the present time; henceforth, a value of 4.3 Gyr is adopted
whenever a specific value is required), there exists a person called Brendon James Brewer, and the
Prime Minister of Australia on the planet is Kevin Rudd.

- Life first arose on the planet at a time $t_L$. Obviously, $t_L < t_0$.

While proposition S may seem overly specific, one is more likely to make correct inferences
by conditioning on a statement that is more specific than, say, "intelligent life arises". See Neal
(2006) for a detailed discussion of this point and a principled framework for the treatment of
anthropic selection effects in general.

Our predictions will be given in the form of probability distributions for all of these
parameters. The probability distributions are chosen to represent our uncertain state of knowledge,
following the Bayesian framework (Jaynes, 2003). Throughout this paper, probabilities of
propositions are denoted by an upper case P() and probability density functions (PDFs) for
variables by a lower case p(); this notation allows probability expressions to become very succinct,
as the rules followed by probabilities and PDFs are written in the same way.

3 Sampling Distribution 

If we only knew the abiogenesis timescale $\mu$, our prediction for $t_L$ would be described by an
exponential distribution:

$$p(t_L|\mu) = \frac{1}{\mu} \exp(-t_L/\mu), \quad t_L > 0 \qquad (1)$$

Figure 1: The probability density for the time at which abiogenesis occurred, given that we exist
at t = 4.3 Gyr after the Earth was first suitable for life (defined as t = 0). Note that as the
abiogenesis timescale becomes larger, this distribution becomes uniform.



Note that this is not an assumption about any frequency distribution that would occur in a
population of Earths; it is only the most conservative probability distribution that has the
expectation value $\mu$ (Jaynes, 1979). When we find out that S is true for the planet we are
watching, the distribution is revised to be truncated to between $t = 0$ and $t = t_0$, and
renormalised:

$$p(t_L|\mu, S) = \frac{\mu^{-1} \exp(-t_L/\mu)}{1 - \exp(-t_0/\mu)}, \quad 0 < t_L < t_0 \qquad (2)$$
Technically, this should have been calculated from Bayes' theorem:

$$p(t_L|\mu, S) = \frac{p(t_L|\mu)\, p(S|t_L, \mu)}{p(S|\mu)}$$

where the first term in the numerator would come from Equation 1. The other term would be
very difficult to quantify; however, any effects that it would include apart from the obvious
truncation effect (S cannot be true unless $t_L < t_0$) would likely be simply quantitative versions
of the evidence and arguments discussed by Lineweaver & Davis (2003). For example, the fact
that it is very unlikely for S to be true if $t_L$ is close to $t_0$ corresponds to the
"non-observability of recent abiogenesis" and would be modelled in the factor $p(S|t_L, \mu)$.
Another possible effect is that there are various epochs in any Earth-like planet's history, and
conditions are suitable for life to arise in only one of those epochs. However, for the purposes of
this paper, the simple truncation of Equation 2 is sufficient to repeat most of L&D's argument,
while highlighting our point of disagreement with it. Any attempt to increase the sophistication of
the model will be deferred to future work.

The sampling distribution (Equation 2) for data given parameters is plotted in Figure 1 for
three different values of the abiogenesis waiting timescale $\mu$: 0.3, 1 and 10 Gyr. Any proposed
value of $\mu$ is a distinct hypothesis that we wish to test, and this sampling distribution defines
the predictions that each hypothesis makes about the observational data $t_L$, the actual time that
abiogenesis occurred. Note that as $\mu$ increases, this tends to a uniform distribution, and hence
moderately large values of $\mu$ (~ 10 Gyr) and extremely large values of $\mu$ make essentially the
same predictions about $t_L$.
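The flattening of the sampling distribution can be seen with a short numerical sketch. It assumes
the sampling distribution is the exponential of Equation 1 truncated to $[0, t_0]$ and
renormalised, with $t_0 = 4.3$ Gyr as adopted in the text; the function name and the choice of
evaluation points are illustrative only.

```python
import math

T0 = 4.3  # present time in Gyr, the value adopted in the text


def truncated_pdf(t_l, mu, t0=T0):
    """Exponential density with mean mu, truncated to [0, t0] and renormalised.

    -math.expm1(-x) evaluates 1 - exp(-x) without loss of precision for small x,
    which matters when mu is much larger than t0.
    """
    if not 0.0 <= t_l <= t0:
        return 0.0
    return math.exp(-t_l / mu) / (mu * -math.expm1(-t0 / mu))


print(f"uniform density on [0, t0]: {1.0 / T0:.4f}")
for mu in (0.3, 1.0, 10.0, 1e3, 1e6):
    early = truncated_pdf(0.25, mu)  # density at an early abiogenesis time
    late = truncated_pdf(4.0, mu)    # density at a late abiogenesis time
    print(f"mu = {mu:>9g} Gyr: p(0.25) = {early:.4f}, p(4.0) = {late:.4f}")
```

For small $\mu$ the density is concentrated at early times, whereas for large $\mu$ both values
approach the uniform density $1/t_0$, so the data cannot distinguish between large and very large
timescales.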

Thus far, this model is virtually identical to that of L&D; the only difference is that L&D
used a discretised time axis with $\Delta t = 200$ Myr and parameterised $\mu$ by
$q \approx \mu^{-1} \Delta t$, the probability of life arising in a time $\Delta t$. This makes it
difficult to see how they could have extracted such confident conclusions about $q$ based on $t_L$,
in light of the above paragraph. This question will be explored in the next section.
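As a rough check on the correspondence between the two parameterisations, the sketch below
compares the approximation $q \approx \mu^{-1} \Delta t$ with the probability that an
exponentially distributed waiting time with mean $\mu$ falls within one interval of length
$\Delta t$, which is $1 - \exp(-\Delta t/\mu)$. Treating this as the continuous-time counterpart
of $q$ is an assumption made here for illustration; it is not taken from L&D.

```python
import math

DT = 0.2  # discretisation interval in Gyr (200 Myr, as used by L&D)


def q_approx(mu, dt=DT):
    """Approximate per-interval probability, q ~ dt / mu."""
    return dt / mu


def q_exact(mu, dt=DT):
    """Probability that an exponential waiting time with mean mu is shorter than dt."""
    return -math.expm1(-dt / mu)


for mu in (0.1, 0.3, 1.0, 10.0, 100.0):
    print(f"mu = {mu:>5g} Gyr: approx q = {q_approx(mu):.4f}, exact q = {q_exact(mu):.4f}")
```

The two agree closely once $\mu$ is large compared with $\Delta t$, which is the regime relevant
to the argument about large timescales. For $\mu < \Delta t$ the approximation exceeds 1 and
should not be used.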

4 Inference About $\mu$

We have a sampling distribution for some data given a parameter of interest, in Equation 2. To
infer the parameter $\mu$ from the known$^1$ value of $t_L$, we use Bayes' Theorem to get the
posterior distribution for $\mu$, which is proportional to a prior distribution times the likelihood
function from Equation 5:

$$p(\mu|t_L, S) \propto p(\mu|S)\, p(t_L|\mu, S) \approx p(\mu)\, p(t_L|\mu, S) \qquad (6)$$

Since S by itself hardly tells us anything about any abiogenesis except that it is possible, the
dependence on S was dropped from the prior. Now, before we can get probabilistic conclusions
about $\mu$ or a related quantity such as $q$, a prior must be assigned. If we are initially
ignorant of $\mu$, a suitable prior is the Jeffreys prior $\propto 1/\mu$. The reason for this is
that it is equivalent to a uniform improper prior for $\log(\mu)$, and hence describes uncertainty
about the order of magnitude of the parameter. Alternatively, it is the only prior that is invariant
under a change of timescale: if we were to find that we are measuring $\mu$ in Terayears rather than
Gigayears, the Jeffreys prior is the only choice that would not change in the newly rescaled
problem. With this choice, the posterior distribution for $\mu$ cannot be normalised unless we
obtain additional information about an upper limit to $\mu$. Hence, it is impossible to construct
credible intervals from these data. All we can do is plot the improper posterior for
$\log_{10}(\mu)$ (which is basically the likelihood, since a Jeffreys prior is uniform for
$\log_{10}(\mu)$), and this is displayed as the solid curve in Figure 2. There is a peak in the
posterior, indicating that there is indeed evidence favouring a particular value for $\mu$ of about
$t_L$. However, the likelihood flattens out at a nonzero (and non-negligible) value for $\mu$
greater than about 2 Gyr. Thus, these data cannot rule out the hypothesis that $\mu$ is enormous and
that Earth hosted the only abiogenesis event(s) in the universe.
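The behaviour of the likelihood described above can be sketched numerically. The code below
assumes the truncated exponential sampling distribution with $t_L = 0.25$ Gyr and $t_0 = 4.3$ Gyr
(the values adopted in the text), and uses a simple midpoint rule as an illustrative integration
scheme. It evaluates the likelihood near its peak and at large $\mu$, and then integrates it over
$\log_{10}(\mu)$ to show that the posterior mass under a prior uniform in $\log_{10}(\mu)$ keeps
growing as the upper limit is raised.

```python
import math

T_L = 0.25  # adopted abiogenesis time in Gyr
T0 = 4.3    # adopted present time in Gyr


def likelihood(mu, t_l=T_L, t0=T0):
    """Truncated exponential sampling density, viewed as a function of mu."""
    return math.exp(-t_l / mu) / (mu * -math.expm1(-t0 / mu))


def mass_in_log10_mu(log10_lo, log10_hi, steps=20000):
    """Midpoint-rule integral of the likelihood over log10(mu).

    With a prior that is uniform in log10(mu), this is the unnormalised
    posterior mass between the two limits.
    """
    h = (log10_hi - log10_lo) / steps
    return sum(likelihood(10.0 ** (log10_lo + (i + 0.5) * h)) for i in range(steps)) * h


for mu in (0.1, 0.25, 1.0, 10.0, 1e3, 1e6):
    print(f"mu = {mu:>9g} Gyr: likelihood = {likelihood(mu):.4f}")
print(f"plateau value 1/t0 = {1.0 / T0:.4f}")

for upper in (3, 6, 9):
    print(f"mass for log10(mu) in [-2, {upper}]: {mass_in_log10_mu(-2, upper):.3f}")
```

Each additional decade of $\mu$ at the high end contributes roughly $1/t_0$ to the integral, so
the total has no finite limit. This is the sense in which the posterior cannot be normalised
without an upper bound on $\mu$.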

This is essentially a quantitative version of an argument that has been put forward previously
(e.g. by Hanson, 1998): "Since no one on Earth would be wondering about the origin of life if Earth
did not contain creatures nearly as intelligent as ourselves, the fact that four billion years
elapsed before high intelligence appeared on Earth seems compatible with any expected time longer
than a few billion years".

Now, what prior did L&D implicitly assume? To answer this, a uniform prior for $q$ must be
translated to a prior for $\mu$ via the approximate relationship $q \approx \mu^{-1} \Delta t$.
Since $q \sim$ Uniform(0, 1), $q/\Delta t = \mu^{-1} \sim$ Uniform(0, $1/\Delta t$). By the usual
rule for transforming probability distributions:

$$p(\mu)\, d\mu = p(\mu^{-1})\, d(\mu^{-1}) \qquad (7)$$

$$= p(\mu^{-1}) \frac{d(\mu^{-1})}{d\mu}\, d\mu \qquad (8)$$

$$= (\Delta t) \times (-\mu^{-2})\, d\mu, \quad \mu \in [\Delta t, \infty) \qquad (9)$$

The negative sign is irrelevant because only the absolute value of the Jacobian matters; the
negative result simply measures the decrease in accumulated probability as one moves leftwards
along the $\mu$ axis, so nothing is amiss. It is apparent that the choice of a uniform prior for
$q$ is equivalent to a prior for $\mu$ that is proportional to $\mu^{-2}$, and truncated to values
of $\mu$ greater than $\Delta t$. This may seem innocuous, but it is significant enough to make the
posterior normalisable. In fact, whereas the likelihood function (and posterior with respect to a
Jeffreys prior) flattens out completely for high $\mu$, the posterior with respect to the L&D prior
decays exponentially in that region. The major effect of this choice of prior on the posterior can
be seen easily in Figure 2.
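The change of variables can be checked by simulation. The following sketch draws $q$ from
Uniform(0, 1), converts each draw to $\mu = \Delta t / q$ using the approximate relationship
quoted above, and compares the simulated tail probabilities with the value $\Delta t / x$ implied
by a density $\Delta t\, \mu^{-2}$ on $\mu \ge \Delta t$. The sample size and random seed are
arbitrary illustrative choices.

```python
import random

DT = 0.2  # discretisation interval in Gyr (200 Myr)


def implied_prior_pdf(mu, dt=DT):
    """Density of mu = dt / q when q ~ Uniform(0, 1): dt / mu**2 for mu >= dt."""
    return dt / mu ** 2 if mu >= dt else 0.0


def implied_prior_tail(x, dt=DT):
    """P(mu > x) = P(q < dt / x) under the uniform prior for q."""
    return min(1.0, dt / x)


random.seed(0)  # arbitrary seed, for reproducibility only
# 1 - random.random() lies in (0, 1], which avoids division by zero.
samples = [DT / (1.0 - random.random()) for _ in range(200000)]


def empirical_tail(x):
    """Fraction of simulated mu values exceeding x."""
    return sum(mu > x for mu in samples) / len(samples)


for x in (1.0, 10.0, 100.0):
    print(f"P(mu > {x:>5g} Gyr): simulated = {empirical_tail(x):.4f}, "
          f"analytic = {implied_prior_tail(x):.4f}")
```

Under this implied prior, timescales longer than 100 Gyr receive a prior probability of only
about 0.002, which illustrates how strongly a uniform prior for $q$ disfavours large $\mu$ before
any data are considered.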

$^1$ $t_L$ is not known exactly, of course. A value of 250 Myr will be adopted whenever a definite
value is required.

Figure 2: The posterior probability density for the logarithm of the abiogenesis timescale, assuming
a Jeffreys prior for the timescale (uniform prior for its logarithm), is plotted here as the solid
curve in the left panel. The prior that is implied by a uniform prior for the quantity $q$ (the
chance of life arising in a finite time interval) is shown as a dotted curve in the right hand
panel, along with the resultant posterior. Note that the likelihood function (proportional to our
posterior) becomes flat at a nonzero value towards the right of the curve. Hence, whilst the data do
support the hypothesis that abiogenesis is likely on Earth-like planets (due to the likelihood
peak), it is not a strong enough constraint to rule out more 'pessimistic' possibilities.

Unless definitive independent evidence can be found that puts an upper limit on possible values of
$\mu$, meaningful credible intervals cannot be constructed. L&D appear to have unwittingly assumed
that they did have that extra required information, or that the early formation of life on Earth
could provide it, but this is not the case. Some information that could provide a likelihood
function that allowed the posterior to be normalised would be the following:

- Detection of life elsewhere. Since it is possible to observe a lack of life elsewhere, the
sampling distribution of Equation 5 would no longer integrate to 1, and would be truncated at the
star's lifetime rather than having anything to do with the age of the Earth. This models some
(quite high, in the case of large $\mu$) probability that life will not arise at all on a given
planet. It was anthropic considerations that led to the truncation and renormalisation, and these do
not apply to the case of life on other planets.

- A very compelling and well understood theory of abiogenesis would enable the direct calculation
of $\mu$ from first principles, in theory at least.

These two possibilities, while not exhaustive, would allow definite inferences about $\mu$ that the
current data do not. This conclusion accords with the common sense attitude that prevails in the
scientific community about what we know and don't know about the probability of abiogenesis.
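The role of an upper limit on $\mu$ discussed in this section can be illustrated with a small
sketch. It assumes the truncated exponential likelihood with $t_L = 0.25$ Gyr and $t_0 = 4.3$ Gyr,
and a Jeffreys prior restricted to a finite range $[\mu_{\min}, \mu_{\max}]$. The limits used
here are hypothetical and chosen only to show how strongly the answer depends on them.

```python
import math

T_L = 0.25  # adopted abiogenesis time in Gyr
T0 = 4.3    # adopted present time in Gyr


def likelihood(mu, t_l=T_L, t0=T0):
    """Truncated exponential sampling density, viewed as a function of mu."""
    return math.exp(-t_l / mu) / (mu * -math.expm1(-t0 / mu))


def integrate_log10(log10_lo, log10_hi, steps=20000):
    """Midpoint-rule integral of the likelihood over log10(mu)."""
    h = (log10_hi - log10_lo) / steps
    return sum(likelihood(10.0 ** (log10_lo + (i + 0.5) * h)) for i in range(steps)) * h


def prob_mu_below(threshold, mu_max, mu_min=1e-3):
    """P(mu < threshold | t_L, S) under a 1/mu prior truncated to [mu_min, mu_max].

    Both mu_min and mu_max are hypothetical limits, not values from the text.
    """
    numerator = integrate_log10(math.log10(mu_min), math.log10(threshold))
    denominator = integrate_log10(math.log10(mu_min), math.log10(mu_max))
    return numerator / denominator


for mu_max in (1e2, 1e6, 1e12):
    print(f"mu_max = {mu_max:g} Gyr: P(mu < 10 Gyr) = {prob_mu_below(10.0, mu_max):.3f}")
```

The posterior probability that $\mu$ is below 10 Gyr falls steadily as the assumed upper limit is
raised, so any credible interval quoted under this prior reflects the chosen limit at least as
much as the data.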



5 Conclusion 

The fact that life arose surprisingly early after the formation of the Earth can be used as evidence
for the hypothesis that abiogenesis is easy, and hence supports the conclusion that life is common
in the universe. However, the evidence is not as conclusive as has been claimed. Specifically,
this study has highlighted the fact that knowledge of the early abiogenesis time on Earth is still
compatible with the following hypothesis: that life is extraordinarily rare in the universe, perhaps
even only on Earth, and we observe early abiogenesis due to chance (we'd have to be moderately
lucky, but not obscenely so). This conclusion differs from that of Lineweaver & Davis (2002) because
they unwittingly made overconfident prior assumptions. Hence, unless there is a direct detection,
the answer to the perennial question "are we alone?" remains "nobody knows".